Character encoding

Author

  • Victor Eijkhout
Abstract

Have you ever wondered what goes on between the ‘A’ you hit on your keyboard, the ‘A’ stored in your file, and the ‘A’ that comes out of your printer? Why does that letter still come out of the printer if the file is printed by your friend in Egypt who doesn’t use the letter ‘A’? Maybe you know that ‘A’ is character 65 (decimal) in ASCII; if you put it on a web page, and it’s visited by someone in Japan, why don’t they get character number 65 in the Kanji alphabet? Do you remember the DOS days, when your Mac-owning colleague would send you a file and what were supposed to be accented characters would turn into smiley faces? Have you ever pasted text from MS-Word into Emacs, and Emacs wanted to save the document as UTF-8? Just what is that about? All this, and more, will be explained in this article.
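The questions above come down to the difference between a character's number and the bytes used to store it. The following Python sketch is an illustration only and is not taken from the article; the variable names and the choice of encodings (ASCII, UTF-8, UTF-16, Latin-1) are assumptions made for the example. It shows that ‘A’ is number 65, that different encodings store the same character as different bytes, and that reading bytes with the wrong encoding garbles accented characters.

```python
# Illustrative sketch (not from the article): one character, several byte forms.

letter = "A"
code_point = ord(letter)                  # the number assigned to 'A'

ascii_bytes = letter.encode("ascii")      # one byte
utf8_bytes = letter.encode("utf-8")       # same single byte as ASCII
utf16_bytes = letter.encode("utf-16-be")  # two bytes, big-endian

# An accented character is stored differently by different encodings.
accented = "é"
latin1_bytes = accented.encode("latin-1")  # one byte
utf8_accented = accented.encode("utf-8")   # two bytes

# Decoding bytes with an encoding other than the one used to write them
# produces the wrong characters instead of an error.
misread = utf8_accented.decode("latin-1")

print(code_point, ascii_bytes, utf8_bytes, utf16_bytes)
print(latin1_bytes, utf8_accented, repr(misread))
```

The sender and the receiver have to agree on the encoding; the bytes alone do not say which one was used.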

Similar resources

Universal Multiple-Octet Coded Character Set International Organization for

responds to Maktari and Mansour's contribution and makes recommendations for a character set that could be accepted for encoding in the UCS.

Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters

By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set. RFC 2231 defines an encoding mechanism for use in Multipurpose Internet Mail Extensions (MIME) headers. This document specifies an encoding suitable for use in HTTP header fields that is compatible with a profile of the encoding defined in RFC 2...

Keyboard for inputting Chinese language

1.1 Technique of inputting Chinese characters. As the structure of Chinese characters is very different from the relatively simple alphabetic systems of western languages, it is very difficult to input Chinese characters into a computer quickly and conveniently. There are a few existing systems, which include those based on the "PinYin" (phonetic) system, a combination of the PinYin system and chara...

Character encoding issues for web passwords

Password authentication remains ubiquitous on the web, primarily because of its low cost and compatibility with any device which allows a user to input text. Yet text is not universal. Computers must use a character encoding system to convert human-comprehensible writing into bits. We examine for the first time the lingering effects of character encoding on the password ecosystem. We report a n...

Internet Mail Consortium

The Unicode Standard [UNICODE], and ISO/IEC 10646 [ISO-10646] jointly define a character set (hereafter referred to as Unicode) which encompasses most of the world’s writing systems. UTF-16, the object of this specification, is an encoding scheme of this character set that has the characteristics of encoding the vast majority of currently-defined characters in exactly two octets and of being ab...

chared: Character Encoding Detection with a Known Language

chared is a system which can detect character encoding of a text document provided the language of the document is known. The system supports a wide range of languages and the most commonly used character encodings. We explain the details of the algorithm, describe the process of creating models for various languages and present results of an evaluation on a collection of Web pages.

Publication date: 2008